Load the tweets and check that they were loaded correctly. We also inspect the summary for a first interpretation; the summary(tweets) output reveals the following:
# Set working directory
# getwd()
# setwd("./data/")
# Load data
load("./Tweets_all.rda")
# Check that tweets are loaded
head(tweets)
## # A tibble: 6 × 14
## created_at id id_str full_text in_reply_to_screen_n…¹
## <dttm> <dbl> <chr> <chr> <chr>
## 1 2023-01-20 17:17:32 1.62e18 1616469988369469… "Im MSc … <NA>
## 2 2023-01-13 07:52:01 1.61e18 1613790954737074… "Was bew… <NA>
## 3 2023-01-12 19:30:01 1.61e18 1613604227141537… "Was uns… <NA>
## 4 2023-01-12 08:23:00 1.61e18 1613436367169634… "Eine di… <NA>
## 5 2023-01-11 14:00:05 1.61e18 1613158809081450… "Wir gra… <NA>
## 6 2023-01-10 17:06:11 1.61e18 1612843252083834… "Unsere … <NA>
## # ℹ abbreviated name: ¹in_reply_to_screen_name
## # ℹ 9 more variables: retweet_count <int>, favorite_count <int>, lang <chr>,
## # university <chr>, tweet_date <dttm>, tweet_minute <dttm>,
## # tweet_hour <dttm>, tweet_month <date>, timeofday_hour <chr>
summary(tweets)
## created_at id id_str
## Min. :2009-09-29 14:29:47.0 Min. :4.469e+09 Length:19575
## 1st Qu.:2015-01-28 15:07:41.5 1st Qu.:5.604e+17 Class :character
## Median :2018-04-13 13:26:56.0 Median :9.848e+17 Mode :character
## Mean :2017-12-09 15:26:50.7 Mean :9.400e+17
## 3rd Qu.:2020-10-20 10:34:50.0 3rd Qu.:1.318e+18
## Max. :2023-01-26 14:49:31.0 Max. :1.619e+18
## full_text in_reply_to_screen_name retweet_count favorite_count
## Length:19575 Length:19575 Min. : 0.000 Min. : 0.00
## Class :character Class :character 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Mode :character Median : 1.000 Median : 0.00
## Mean : 1.289 Mean : 1.37
## 3rd Qu.: 2.000 3rd Qu.: 2.00
## Max. :267.000 Max. :188.00
## lang university tweet_date
## Length:19575 Length:19575 Min. :2009-09-29 00:00:00.00
## Class :character Class :character 1st Qu.:2015-01-28 00:00:00.00
## Mode :character Mode :character Median :2018-04-13 00:00:00.00
## Mean :2017-12-09 02:25:45.00
## 3rd Qu.:2020-10-20 00:00:00.00
## Max. :2023-01-26 00:00:00.00
## tweet_minute tweet_hour
## Min. :2009-09-29 14:29:00.00 Min. :2009-09-29 14:00:00.00
## 1st Qu.:2015-01-28 15:07:00.00 1st Qu.:2015-01-28 14:30:00.00
## Median :2018-04-13 13:26:00.00 Median :2018-04-13 13:00:00.00
## Mean :2017-12-09 15:26:24.68 Mean :2017-12-09 14:59:43.81
## 3rd Qu.:2020-10-20 10:34:30.00 3rd Qu.:2020-10-20 10:00:00.00
## Max. :2023-01-26 14:49:00.00 Max. :2023-01-26 14:00:00.00
## tweet_month timeofday_hour
## Min. :2009-09-01 Length:19575
## 1st Qu.:2015-01-01 Class :character
## Median :2018-04-01 Mode :character
## Mean :2017-11-24
## 3rd Qu.:2020-10-01
## Max. :2023-01-01
We now start preprocessing the tweets; to calculate the intervals between tweets later on, some additional properties are needed. The preprocessing steps transform the raw tweet data into a structured format suitable for analysis, as outlined in the comments below.
# Preprocessing: convert created_at to POSIXct and derive date, weekday, year, and month fields per university. Extract emojis from the tweet text. Weekdays are labeled according to the system locale, with the week starting on Monday
tweets <- tweets %>%
mutate(
created_at = as.POSIXct(created_at, format = "%Y-%m-%d %H:%M:%S"),
date = as.Date(created_at),
day = lubridate::wday(created_at,
label = TRUE, abbr = FALSE,
week_start = getOption("lubridate.week.start", 1),
locale = Sys.getlocale("LC_TIME")
),
year = year(created_at),
month = floor_date(created_at, "month"),
university = as.character(university),
full_text_emojis = replace_emoji(full_text, emoji_dt = lexicon::hash_emojis)
)
# Helper function to remove emoji tags
# replace_emoji() inserts each emoji into the text as a tag plus its name; we remove the tags here
remove_emoji_tags <- function(text) {
str_remove_all(text, "<[a-z0-9]{2}>")
}
# Remove Emoji Tags
tweets$full_text_emojis <- sapply(tweets$full_text_emojis, remove_emoji_tags)
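As a quick sanity check, the tag stripping can be exercised on a small made-up string (a base-R gsub() sketch equivalent to the stringr call above; the `<..>` byte tags are invented for illustration):

```r
# Base-R equivalent of remove_emoji_tags(): strips two-character tags such as <f0>
remove_emoji_tags_base <- function(text) {
  gsub("<[a-z0-9]{2}>", "", text)
}

remove_emoji_tags_base("Great event <f0><9f><8e><89> today")
# -> "Great event  today"
```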
# Store emojis in a separate column to analyze later
tweets$emoji_unicode <- tweets %>%
emoji_extract_nest(full_text) %>%
select(.emoji_unicode)
Each university has a distinct peak hour for tweeting, typically within standard working hours (9 AM to 5 PM). This suggests a strategic approach: posting when the target audience is most likely to be online. Activity appears highest in the morning, wanes around midday, and continues to decline until the end of the workday.
There isn’t a consistent “most active day” across universities. Some favor weekdays, while others show higher activity on weekends. This could reflect differences in their target audience or the nature of their content.
The pattern also suggests that tweet activity tends to be higher earlier in the week, with motivation and tweet frequency potentially falling as the week progresses.
# Count each tweet by university and hour of the day
tweet_counts_by_hour_of_day <- tweets %>%
group_by(university, timeofday_hour) %>%
count() %>%
arrange(university, timeofday_hour)
# Plot the number of tweets by university and hour of the day
ggplot(
tweet_counts_by_hour_of_day,
aes(
x = timeofday_hour, y = n,
color = university, group = university
)
) +
geom_line() +
facet_wrap(~university) +
labs(
title = "Number of tweets by university and hour",
x = "Hour of day",
y = "Number of tweets"
)
# Show most active hours for each university
hours_with_most_tweets_by_uni <- tweet_counts_by_hour_of_day %>%
group_by(university, timeofday_hour) %>%
summarize(total_tweets = sum(n)) %>%
group_by(university) %>%
slice_max(n = 1, order_by = total_tweets)
print(hours_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups: university [8]
## university timeofday_hour total_tweets
## <chr> <chr> <int>
## 1 FHNW 09 344
## 2 FH_Graubuenden 11 493
## 3 ZHAW 17 580
## 4 bfh 08 497
## 5 hes_so 10 315
## 6 hslu 09 380
## 7 ost_fh 08 44
## 8 supsi_ch 11 330
# Show most active hour overall
hour_with_most_tweets <- tweet_counts_by_hour_of_day %>%
group_by(timeofday_hour) %>%
summarize(total_tweets = sum(n)) %>%
arrange(desc(total_tweets)) %>%
slice_max(n = 1, order_by = total_tweets)
print(hour_with_most_tweets)
## # A tibble: 1 × 2
## timeofday_hour total_tweets
## <chr> <int>
## 1 11 2356
# Count each tweet by university and weekday
tweet_counts_by_week_day <- tweets %>%
group_by(university, day) %>%
count() %>%
arrange(university, day)
# Plot the number of tweets by university and day of the week
ggplot(
tweet_counts_by_week_day,
aes(
x = day, y = n,
color = university,
group = university
)
) +
geom_line() +
facet_wrap(~university) +
labs(
title = "Number of tweets by university and day of the week",
x = "Day of the week",
y = "Number of tweets"
)
# Show most active days for each university
days_with_most_tweets_by_uni <- tweet_counts_by_week_day %>%
group_by(university, day) %>%
summarize(total_tweets = sum(n)) %>%
group_by(university) %>%
slice_max(n = 1, order_by = total_tweets)
print(days_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups: university [8]
## university day total_tweets
## <chr> <ord> <int>
## 1 FHNW Tuesday 575
## 2 FH_Graubuenden Tuesday 751
## 3 ZHAW Wednesday 636
## 4 bfh Tuesday 651
## 5 hes_so Tuesday 415
## 6 hslu Thursday 603
## 7 ost_fh Friday 65
## 8 supsi_ch Friday 461
# Combine the most active hours and days for each university to show heatmap
heatmap_data <- tweets %>%
group_by(timeofday_hour, day) %>%
count() %>%
ungroup()
# Plot heatmap: most tweets are clearly posted during working hours, Monday to Friday
ggplot(heatmap_data, aes(x = day, y = timeofday_hour, fill = n)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(
title = "Heatmap of Tweet Activity by Hour and Day",
x = "Day",
y = "Hour",
fill = "Number of Tweets"
) +
theme_minimal()
While universities have peak hours and days, the intervals between tweets vary significantly, indicating a reactive strategy rather than a rigid release schedule: the universities appear to respond to real-time events or trends instead of sticking to a fixed posting calendar.
To further understand the dispersion of tweets, we analyze the time intervals between consecutive tweets using the mean interval, standard deviation, and entropy. These measures show how consistent or sporadic the posting behavior is: high variability means a university posts at irregular intervals, and higher entropy means its tweet timing is less predictable; both are signs of a dynamic, responsive posting strategy. Here are the results for selected universities:
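To make the entropy measure concrete, here is a minimal base-R sketch (a hand-rolled analogue of entropy::entropy() applied to discretize()d counts, with made-up interval values; it is not the code used in the analysis): a perfectly regular posting schedule yields zero entropy, while scattered intervals yield a higher value.

```r
# Shannon entropy (in nats) of time intervals after binning them,
# analogous to entropy::entropy(discretize(x, numBins = 30))
interval_entropy <- function(intervals, num_bins = 30) {
  counts <- table(cut(intervals, breaks = num_bins)) # bin the intervals
  p <- counts / sum(counts)                          # empirical bin probabilities
  p <- p[p > 0]                                      # treat 0 * log(0) as 0
  -sum(p * log(p))
}

regular   <- rep(60, 48)                  # one tweet every 60 minutes: fully predictable
irregular <- c(5, 240, 12, 1400, 60, 800, # made-up scattered gaps: hard to predict
               3, 2000, 30, 500, 90, 1200)

interval_entropy(regular)   # 0: all intervals fall into a single bin
interval_entropy(irregular) # > 0: intervals spread over several bins
```

A strictly scheduled account would sit near zero; the yearly values computed for the universities below mostly lie between roughly 0.6 and 2.8, consistent with irregular posting.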
tweets <- tweets %>%
arrange(university, created_at) %>%
group_by(university) %>%
mutate(time_interval = as.numeric(
difftime(created_at, lag(created_at), units = "mins")
))
# Plotting the time intervals
universities <- unique(tweets$university)
for (uni in universities) {
uni_filtered_data <- tweets %>%
filter(university == uni)
# Plot the distribution of time intervals
print(ggplot(uni_filtered_data, aes(x = time_interval)) +
geom_histogram(fill = "lightblue", bins = 30) +
facet_wrap(~year) +
labs(
title = paste0("Distribution of time intervals between tweets - ", uni),
x = "Time interval (minutes)",
y = "Tweet count"
))
  # Also plot monthly tweet counts: an average interval can look "short", but a university that posts only twice a year would wrongly appear active without this context
tweet_counts <- uni_filtered_data %>%
group_by(tweet_month) %>%
summarise(tweet_count = n())
print("Tweet count for each month")
print(ggplot(tweet_counts, aes(x = tweet_month, y = tweet_count)) +
geom_line(color = "#F9C301") +
geom_point(color = "#37556E", size = 3) +
scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
labs(
title = paste0("Monthly Tweet Activity - ", uni),
x = "Month",
y = "Tweet Count"
) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)))
# Calculate dispersion measures for time intervals
dispersion_measures <- uni_filtered_data %>%
group_by(year) %>%
summarise(
mean_interval = mean(time_interval, na.rm = TRUE),
sd_interval = sd(time_interval, na.rm = TRUE),
entropy_interval = entropy::entropy(discretize(time_interval,
numBins = 30, r = range(time_interval, na.rm = TRUE)
))
)
print(paste("Dispersion measures for", uni))
print(dispersion_measures)
# Line plot for mean interval and standard deviation over the years
print(ggplot(dispersion_measures, aes(x = year)) +
geom_line(aes(y = mean_interval, color = "Mean Interval")) +
geom_point(aes(y = mean_interval, color = "Mean Interval")) +
geom_line(aes(y = sd_interval, color = "Standard Deviation")) +
geom_point(aes(y = sd_interval, color = "Standard Deviation")) +
scale_y_continuous(sec.axis = dup_axis()) +
labs(
title = paste0("Mean Interval and Standard Deviation of Difftime Over Years - ", uni),
x = "Year",
y = "Minutes",
color = "Measure"
) +
theme_minimal())
print(ggplot(dispersion_measures, aes(x = year)) +
geom_line(aes(y = entropy_interval, color = "Entropy Interval")) +
geom_point(aes(y = entropy_interval, color = "Entropy Interval")) +
scale_y_continuous(sec.axis = dup_axis()) +
labs(
title = paste0("Entropy of Tweet Intervals Over Years - ", uni),
x = "Year",
y = "Entropy"
) +
theme_minimal())
}
## [1] "Tweet count for each month"
## [1] "Dispersion measures for FHNW"
## # A tibble: 13 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2011 6994. 4215. 0.693
## 2 2012 2120. 2893. 2.01
## 3 2013 1291. 2668. 1.29
## 4 2014 3238. 4378. 2.17
## 5 2015 4695. 7894. 1.81
## 6 2016 5382. 7382. 2.21
## 7 2017 3696. 6566. 1.61
## 8 2018 1514. 3440. 0.748
## 9 2019 1657. 2226. 1.53
## 10 2020 1744. 1670. 2.36
## 11 2021 2022. 2269. 2.36
## 12 2022 1830. 1968. 2.35
## 13 2023 2160. 2580. 2.15
## [1] "Tweet count for each month"
## [1] "Dispersion measures for FH_Graubuenden"
## # A tibble: 15 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2009 2313. 3692. 2.00
## 2 2010 33181. 63719. 1.39
## 3 2011 6164. 8193. 1.97
## 4 2012 2501. 4219. 1.64
## 5 2013 3197. 5529. 1.76
## 6 2014 1254. 3569. 0.578
## 7 2015 1088. 1521. 1.98
## 8 2016 1448. 1726. 1.89
## 9 2017 2227. 2462. 2.14
## 10 2018 2384. 3034. 1.88
## 11 2019 3019. 3146. 2.38
## 12 2020 3011. 2764. 2.51
## 13 2021 2913. 2723. 2.78
## 14 2022 3558. 4138. 2.45
## 15 2023 13889. 7145. 1.10
## [1] "Tweet count for each month"
## [1] "Dispersion measures for ZHAW"
## # A tibble: 12 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2012 1178. 1486. 2.09
## 2 2013 1291. 1963. 1.35
## 3 2014 1098. 1884. 1.29
## 4 2015 2032. 2341. 2.00
## 5 2016 1570. 1847. 2.00
## 6 2017 1524. 1580. 2.21
## 7 2018 1490. 1552. 2.42
## 8 2019 1448. 1570. 2.25
## 9 2020 2527. 2917. 2.38
## 10 2021 3160. 2988. 2.43
## 11 2022 5463. 6825. 2.06
## 12 2023 35371. 50020. 0.693
## [1] "Tweet count for each month"
## [1] "Dispersion measures for bfh"
## # A tibble: 12 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2012 1483. 2261. 1.68
## 2 2013 1265. 1888. 1.57
## 3 2014 1055. 1526. 1.78
## 4 2015 2209. 2797. 2.03
## 5 2016 3301. 4155. 2.19
## 6 2017 3149. 3790. 2.35
## 7 2018 2510. 2948. 2.10
## 8 2019 2059. 2686. 1.74
## 9 2020 1968. 2164. 1.97
## 10 2021 1484. 1909. 1.52
## 11 2022 1856. 2106. 2.31
## 12 2023 2800. 3313. 2.10
## [1] "Tweet count for each month"
## [1] "Dispersion measures for hes_so"
## # A tibble: 14 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2010 9353. 10637. 2.54
## 2 2011 4852. 6117. 2.09
## 3 2012 4983. 6238. 2.38
## 4 2013 7294. 17630. 1.22
## 5 2014 6485. 8252. 2.27
## 6 2015 4772. 10362. 1.34
## 7 2016 5423. 9275. 1.78
## 8 2017 8087. 14932. 1.79
## 9 2018 3107. 6086. 1.65
## 10 2019 2622. 7755. 0.699
## 11 2020 1521. 2343. 1.66
## 12 2021 2956. 4038. 1.97
## 13 2022 2427. 2218. 2.74
## 14 2023 1860. 1787. 2.11
## [1] "Tweet count for each month"
## [1] "Dispersion measures for hslu"
## # A tibble: 8 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2016 2192. 3286. 1.79
## 2 2017 1702. 2338. 1.91
## 3 2018 1261. 1765. 1.76
## 4 2019 1960. 2304. 2.20
## 5 2020 1095. 1423. 2.08
## 6 2021 569. 598. 2.41
## 7 2022 844. 905. 2.16
## 8 2023 1909. 1524. 2.58
## [1] "Tweet count for each month"
## [1] "Dispersion measures for ost_fh"
## # A tibble: 3 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2020 1478. 1653. 2.20
## 2 2021 5272. 7736. 2.06
## 3 2022 8470. 15109. 1.77
## [1] "Tweet count for each month"
## [1] "Dispersion measures for supsi_ch"
## # A tibble: 11 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2013 2536. 3892. 1.85
## 2 2014 5283. 12951. 1.15
## 3 2015 8346. 12979. 1.99
## 4 2016 2634. 5705. 1.48
## 5 2017 1396. 3394. 0.772
## 6 2018 1648. 3127. 1.45
## 7 2019 2302. 3156. 1.81
## 8 2020 3093. 3234. 2.34
## 9 2021 1769. 2297. 1.49
## 10 2022 2677. 3112. 1.88
## 11 2023 2372. 2840. 1.96
# Descriptive statistics of time intervals
summary(tweets$time_interval)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 148.2 1128.8 2097.6 2428.3 220707.0 8
The data indicates that Swiss Universities of Applied Sciences primarily tweet during working hours and show distinct patterns in their most active days and hours. Workers tend to be more productive and active on Twitter in the morning, with a noticeable decline in activity around midday and towards the end of the week. The dispersion measures support this interpretation, showing varying levels of unpredictability across different years and universities.
The tweets are filtered by language, focusing on German, French, Italian, and English; these languages were chosen based on their popularity across all tweet languages. The preprocessing removes common and extended stopwords, including non-meaningful tokens such as 'amp' (the HTML entity for '&') and 'rt' (commonly found in retweets). The extended stopword list also includes hashtags, handles, and URL fragments related to specific Swiss universities.
Next, the code processes tweets separately for each language. This involves creating tokens from the text, removing unwanted characters, stemming words, and creating n-grams. The processed tokens are then used to create Document-Feature Matrices (DFMs) for each language.
langs <- c("de", "fr", "it", "en")
tweets_filtered <- tweets %>%
filter(lang %in% langs)
# Define extended stopwords (outside loop for efficiency)
# Remove 'amp' as it is not meaningful: it is just the HTML entity for the '&' symbol
# Remove 'rt', which commonly marks retweets (and can also appear as a stem fragment of words such as 'engagiert')
extended_stopwords <- c(
"#fhnw", "#bfh", "@htw_chur", "#hslu", "#supsi", "#sups",
"amp", "rt", "fr", "ber", "t.co", "https", "http", "www", "com", "html"
)
# Create separate DFMs for each language
dfm_list <- list()
for (sel_lang in langs) {
# Subset tweets for the current language
tweets_lang <- tweets_filtered %>%
filter(lang == sel_lang)
# Create tokens for the current language
stopwords_lang <- stopwords(sel_lang)
  # Create tokens for all tweets:
  # - build a corpus first, because tokens() only accepts character, corpus, list, tokens, or tokens_xptr objects
  # - tokenize and remove URLs, punctuation, numbers, symbols, and separators
  # - transform to lowercase
  # - stem all words
  # - keep unigrams only (bigrams and trigrams are examined later)
  # - remove stopwords after stemming, since stemming strips the endings from some words
tokens_lang <- tweets_lang %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem(lang = sel_lang) %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(stopwords_lang, extended_stopwords), selection = "remove"
)
# Create DFM for the current language
dfm_list[[sel_lang]] <- dfm(tokens_lang)
}
Tweets were analyzed across four languages: German, French, Italian, and English. Each university tends to tweet predominantly in one or more languages, reflecting the linguistic diversity of Switzerland.
It’s important to note that some words like “right” 👉 and “arrow” ➡️ are actually names of parsed emojis and not written words in the tweets.
Word clouds for each language visually show the most common words, emphasizing their relative frequencies. Although the universities tweet in multiple languages, each still has a dominant one: BFH and FHNW tweet mostly in German, HES-SO in French, SUPSI in Italian, and HSLU in English.
# Word Frequencies & Visualization
words_freqs_en <- sort(colSums(dfm_list$en), decreasing = TRUE)
head(words_freqs_en, 20)
## student @hslu new @zhaw project univers
## 117 90 86 73 69 69
## day thank switzerland swiss scienc innov
## 68 67 67 65 64 61
## now studi great today join @fhnw
## 54 54 53 52 48 47
## research @supsi_ch
## 47 47
wordcloud(
words = names(words_freqs_en),
freq = words_freqs_en,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
words_freqs_de <- sort(colSums(dfm_list$de), decreasing = TRUE)
head(words_freqs_de, 20)
## neu mehr schweiz werd all studier heut hochschul
## 1680 1133 988 781 731 729 696 621
## statt bfh jahr bern digital thema findet knnen
## 607 598 539 533 525 524 520 517
## projekt studi welch arbeit
## 488 484 464 444
wordcloud(
words = names(words_freqs_de),
freq = words_freqs_de,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
words_freqs_it <- sort(colSums(dfm_list$it), decreasing = TRUE)
head(words_freqs_it, 20)
## nuov sups progett #supsinews info student
## 214 208 176 151 147 146
## present iscrizion cors #supsievent ricerc formazion
## 145 144 142 139 137 135
## scopr inform diplom bachelor apert tutt
## 129 120 119 113 111 109
## master dipart
## 107 105
wordcloud(
words = names(words_freqs_it),
freq = words_freqs_it,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# Some apparently English words here (e.g. 'right', 'arrow') are actually parsed emoji names
words_freqs_fr <- sort(colSums(dfm_list$fr), decreasing = TRUE)
head(words_freqs_fr, 20)
## hes-so right arrow projet dan a tudi haut
## 515 433 324 251 249 234 200 182
## col @hes_so dcouvr @hessoval book #hes_so recherch open
## 155 150 130 129 123 119 117 117
## mast suiss plus nouveau
## 111 110 105 99
wordcloud(
words = names(words_freqs_fr),
freq = words_freqs_fr,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# University-specific Analysis
for (uni in unique(tweets$university)) {
# Subset tweets for the current university
uni_tweets <- tweets %>%
filter(university == uni)
tokens_lang <- uni_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
# Create Data Frame Matrix for uni with all languages
uni_dfm <- dfm(tokens_lang)
# Word Frequencies
uni_word_freqs <- sort(colSums(uni_dfm), decreasing = TRUE)
# Print the most common words (the parsed emoji name 'right' appears often)
print(paste("Most common words for", uni, ":"))
print(head(uni_word_freqs, 20))
wordcloud(
words = names(uni_word_freqs),
freq = uni_word_freqs,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
}
## [1] "Most common words for FHNW :"
## @fhnwbusi fhnw @hsafhnw mehr hochschul
## 380 320 261 233 214
## @fhnwtechnik @fhnw neue campus heut
## 207 205 160 149 149
## @fhnwpsychologi studierend neuen projekt schweiz
## 138 114 108 101 101
## knnen basel entwickelt brugg-windisch prof
## 97 95 85 83 82
## [1] "Most common words for FH_Graubuenden :"
## chur #htwchur statt htw findet
## 367 360 307 292 276
## mehr blogbeitrag #fhgr #infoanlass graubnden
## 230 203 182 179 177
## infoanlass #chur neuen heut #studium
## 157 128 128 127 123
## manag @suedostschweiz neue fh tourismus
## 111 111 103 98 96
## [1] "Most common words for ZHAW :"
## zhaw @zhaw @engineeringzhaw #zhaw
## 281 257 245 187
## cc neue dank @iam_winterthur
## 185 152 147 146
## heut mehr knnen zeigt
## 141 138 137 133
## winterthur schweizer via @sml_zhaw
## 126 125 124 120
## #zhawimpact schweiz studi gibt
## 120 115 115 105
## [1] "Most common words for bfh :"
## bfh bern berner mehr @bfh_hesb
## 606 282 231 213 208
## neue thema fachhochschul @hkb_bfh knnen
## 205 199 166 117 109
## projekt #knoten_maschen biel heut innen
## 109 107 102 102 102
## anmelden schweizer neuen schweiz statt
## 101 100 96 92 89
## [1] "Most common words for hes_so :"
## hes-so right arrow projet dan tudiant
## 532 445 331 249 248 192
## haut @hes_so cole @hessovalai book master
## 178 172 149 133 124 123
## #hes_so open recherch suiss plus magnifi
## 123 123 116 109 103 97
## glass tilt
## 97 97
## [1] "Most common words for hslu :"
## @hslu luzern mehr hochschul depart
## 368 334 258 198 175
## #hsluinformatik heut neue schweizer zeigt
## 173 171 165 146 133
## design knnen studi schweiz gibt
## 132 127 125 118 114
## jahr ab neuen projekt arbeit
## 107 105 105 103 99
## [1] "Most common words for ost_fh :"
## #ostschweizerfachhochschul @ost_fh
## 73 64
## ost @ozg_ost
## 55 28
## mehr neue
## 26 22
## st.gallen rapperswil
## 17 17
## neuen ostschweiz
## 17 15
## #informatik podcast
## 15 15
## detail gibt
## 15 14
## #ost menschen
## 12 12
## thema campus
## 12 12
## @eastdigit #podcast
## 12 12
## [1] "Most common words for supsi_ch :"
## supsi #supsiev #supsinew info studenti formazion
## 231 183 168 148 133 132
## progetto @supsi_ch iscrizioni master nuovo bachelor
## 126 126 117 117 116 115
## right innov dipartimento pi oggi @usi_univers
## 114 109 104 103 102 102
## informazioni manag
## 97 95
To understand user reactions, we calculate a 'weighted engagement' metric that combines favorite and retweet counts, weighting retweets double (for example, a tweet with 10 favorites and 5 retweets scores 10 * 1 + 5 * 2 = 20). The tweets with the highest engagement are then analyzed by hour and day to identify patterns in user interaction.
The provided bar plots show the average engagement of tweets by hour of the day and by day of the week. Each bar represents the average engagement score for tweets posted during specific hours or on specific days.
# Calculate a 'weighted engagement' metric
tweets <- tweets %>%
mutate(
weighted_engagement = favorite_count * 1 + retweet_count * 2
)
# Identify tweets with the highest weighted engagement
most_engaged_tweets <- tweets %>%
arrange(desc(weighted_engagement)) %>%
head(1000) # Top 1000 for analysis
# Calculate average engagement by hour
engagement_hour <- most_engaged_tweets %>%
group_by(timeofday_hour) %>%
summarise(avg_engagement = mean(weighted_engagement, na.rm = TRUE))
ggplot(engagement_hour, aes(x = timeofday_hour, y = avg_engagement)) +
geom_bar(stat = "identity", fill = "blue") +
labs(
title = "Average Engagement by Hour",
x = "Hour of Day",
y = "Average Engagement"
) +
theme_minimal()
# Calculate average engagement by day
engagement_day <- most_engaged_tweets %>%
group_by(day) %>%
summarise(avg_engagement = mean(weighted_engagement, na.rm = TRUE))
# Plot average engagement by day
ggplot(engagement_day, aes(x = day, y = avg_engagement)) +
geom_bar(stat = "identity", fill = "blue") +
labs(
title = "Average Engagement by Day of the Week",
x = "Day of the Week",
y = "Average Engagement"
) +
theme_minimal()
The most common words in the most liked tweets include “mehr” (more), “neue” (new), “schweiz” (Switzerland), “heut” (today), and “hochschul” (university). These words suggest that tweets focusing on new developments, events happening today, and general updates about Switzerland and universities tend to receive more likes.
# Preprocessing content of most liked tweets
tokens_most_engaged <- most_engaged_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>% # multilingual subset: use the default stemmer (sel_lang would be a stale loop variable here)
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
tokens_most_engaged_dfm <- dfm(tokens_most_engaged)
freqs_most_engaged <- sort(colSums(tokens_most_engaged_dfm), decreasing = TRUE)
# Print the most common words (the parsed emoji name 'right' appears often)
head(freqs_most_engaged, 20)
## mehr neue schweiz schweizer right
## 81 67 48 47 46
## heut zeigt #hsluinformatik studi zhaw
## 44 41 40 39 39
## hes-so knnen neuen hochschul campus
## 38 38 36 34 33
## innov gibt ab entwickelt bfh
## 31 30 30 30 30
set.seed(123)
wordcloud(
words = names(freqs_most_engaged),
freq = freqs_most_engaged,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
Each university shows distinct patterns in the words and emojis used in their tweets. The analysis involved creating word clouds and identifying the most common words and emojis.
Most Common Words: each university's own name, handles, and home town dominate its word cloud (e.g. 'chur' for FH Graubünden, 'bern' for BFH, 'luzern' for HSLU).
Most Common Emojis:
- FHNW: 👉 (backhand index pointing right), 💛 (yellow heart), and 🖤 (black heart).
- FH Graubünden: 🎉 (party popper), 😃 (grinning face with big eyes), and 😊 (blush).
- ZHAW: 👉 (backhand index pointing right), ⚡ (high voltage), and 😉 (wink).
- BFH: 👉 (backhand index pointing right), 🔋 (battery), and 👇 (backhand index pointing down).
- HES-SO: 👉 (backhand index pointing right), 🎓 (graduation cap), and ➡ (arrow right).
- HSLU: 🎓 (graduation cap), 👨 (man), and 🚀 (rocket).
- OST-FH: 👉 (backhand index pointing right), ➡ (arrow right), and 🎓 (graduation cap).
- SUPSI-CH: 👉 (backhand index pointing right), 🎓 (graduation cap), and 🎉 (party popper).
for (uni in unique(tweets$university)) {
uni_tweets <- tweets %>%
filter(university == uni, lang %in% langs)
tokens_uni <- uni_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
uni_dfm <- dfm(tokens_uni)
freqs_uni <- sort(colSums(uni_dfm), decreasing = TRUE)
# Print the most common words (the parsed emoji name 'right' appears often)
print(paste("Most common words for", uni, ":"))
print(head(freqs_uni, 20))
set.seed(123)
wordcloud(
words = names(freqs_uni),
freq = freqs_uni,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# Analyze Top Emojis by University
emoji_count_per_university <- uni_tweets %>%
top_n_emojis(full_text)
print(paste("Top emojis for", uni, ":"))
print(emoji_count_per_university)
print(emoji_count_per_university %>%
mutate(emoji_name = reorder(emoji_name, n)) %>%
ggplot(aes(n, emoji_name)) +
geom_col() +
labs(x = "Count", y = NULL, title = "Top 20 Emojis Used"))
}
## [1] "Most common words for FHNW :"
## @fhnwbusi fhnw @hsafhnw mehr hochschul
## 377 320 257 233 214
## @fhnwtechnik @fhnw neue campus heut
## 205 203 160 149 149
## @fhnwpsychologi studierend neuen projekt schweiz
## 138 114 108 101 101
## knnen basel entwickelt brugg-windisch prof
## 97 94 85 83 82
## [1] "Top emojis for FHNW :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 57
## 2 yellow_heart 💛 Smileys & Emotion 34
## 3 black_heart 🖤 Smileys & Emotion 32
## 4 woman 👩 People & Body 28
## 5 clap 👏 People & Body 17
## 6 flag_Switzerland 🇨🇭 Flags 17
## 7 man 👨 People & Body 17
## 8 microscope 🔬 Objects 15
## 9 computer 💻 Objects 14
## 10 robot 🤖 Smileys & Emotion 14
## 11 graduation_cap 🎓 Objects 13
## 12 school 🏫 Travel & Places 13
## 13 face_with_medical_mask 😷 Smileys & Emotion 12
## 14 raised_hands 🙌 People & Body 12
## 15 female_sign ♀️ Symbols 10
## 16 star_struck 🤩 Smileys & Emotion 10
## 17 trophy 🏆 Activities 10
## 18 party_popper 🎉 Activities 9
## 19 woman_scientist 👩🔬 People & Body 9
## 20 sun_with_face 🌞 Travel & Places 8
## [1] "Most common words for FH_Graubuenden :"
## chur #htwchur statt htw findet
## 365 355 305 290 274
## mehr blogbeitrag #fhgr #infoanlass graubnden
## 230 203 182 177 176
## infoanlass #chur neuen heut #studium
## 157 128 128 127 123
## manag @suedostschweiz neue fh tourismus
## 110 108 103 98 96
## [1] "Top emojis for FH_Graubuenden :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 party_popper 🎉 Activities 19
## 2 grinning_face_with_big_eyes 😃 Smileys & Emotion 15
## 3 blush 😊 Smileys & Emotion 9
## 4 smiling_face_with_sunglasses 😎 Smileys & Emotion 8
## 5 bulb 💡 Objects 7
## 6 flexed_biceps 💪 People & Body 7
## 7 +1 👍 People & Body 6
## 8 camera_flash 📸 Objects 6
## 9 four_leaf_clover 🍀 Animals & Nature 6
## 10 grinning_face_with_smiling_eyes 😄 Smileys & Emotion 6
## 11 heart_eyes 😍 Smileys & Emotion 6
## 12 hugs 🤗 Smileys & Emotion 6
## 13 grinning 😀 Smileys & Emotion 5
## 14 computer 💻 Objects 4
## 15 female_sign ♀️ Symbols 4
## 16 graduation_cap 🎓 Objects 4
## 17 robot 🤖 Smileys & Emotion 4
## 18 backhand_index_pointing_down 👇 People & Body 3
## 19 lady_beetle 🐞 Animals & Nature 3
## 20 ocean 🌊 Travel & Places 3
## [1] "Most common words for ZHAW :"
## zhaw @zhaw @engineeringzhaw #zhaw
## 278 254 244 184
## cc neue dank @iam_winterthur
## 183 151 146 145
## heut mehr knnen zeigt
## 141 138 137 133
## winterthur schweizer via @sml_zhaw
## 126 125 124 120
## #zhawimpact schweiz studi gibt
## 120 115 115 105
## [1] "Top emojis for ZHAW :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 21
## 2 high_voltage ⚡ Travel & Places 11
## 3 wink 😉 Smileys & Emotion 9
## 4 clap 👏 People & Body 5
## 5 flag_Switzerland 🇨🇭 Flags 5
## 6 rocket 🚀 Travel & Places 5
## 7 +1 👍 People & Body 4
## 8 arrow_right ➡️ Symbols 4
## 9 bug 🐛 Animals & Nature 3
## 10 computer 💻 Objects 3
## 11 flexed_biceps 💪 People & Body 3
## 12 man 👨 People & Body 3
## 13 bangbang ‼️ Symbols 2
## 14 camera_flash 📸 Objects 2
## 15 dark_skin_tone 🏿 Component 2
## 16 exclamation ❗ Symbols 2
## 17 female_sign ♀️ Symbols 2
## 18 four_leaf_clover 🍀 Animals & Nature 2
## 19 green_salad 🥗 Food & Drink 2
## 20 grinning 😀 Smileys & Emotion 2
## [1] "Most common words for bfh :"
## bfh bern berner mehr @bfh_hesb
## 606 280 231 213 207
## neue thema fachhochschul @hkb_bfh knnen
## 205 199 166 117 109
## projekt #knoten_maschen biel heut innen
## 109 107 102 102 102
## anmelden schweizer neuen schweiz statt
## 101 100 96 92 89
## [1] "Top emojis for bfh :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 49
## 2 battery 🔋 Objects 16
## 3 backhand_index_pointing_down 👇 People & Body 12
## 4 woman 👩 People & Body 12
## 5 palm_tree 🌴 Animals & Nature 11
## 6 bulb 💡 Objects 10
## 7 computer 💻 Objects 10
## 8 evergreen_tree 🌲 Animals & Nature 10
## 9 graduation_cap 🎓 Objects 10
## 10 party_popper 🎉 Activities 10
## 11 robot 🤖 Smileys & Emotion 10
## 12 rocket 🚀 Travel & Places 10
## 13 clap 👏 People & Body 9
## 14 coconut 🥥 Food & Drink 9
## 15 date 📅 Objects 9
## 16 deciduous_tree 🌳 Animals & Nature 9
## 17 flag_Switzerland 🇨🇭 Flags 9
## 18 automobile 🚗 Travel & Places 8
## 19 clinking_glasses 🥂 Food & Drink 8
## 20 seedling 🌱 Animals & Nature 8
## [1] "Most common words for hes_so :"
## hes-so right arrow projet dan tudiant
## 529 445 331 249 248 192
## haut @hes_so cole @hessovalai book master
## 177 170 149 133 124 123
## #hes_so open recherch suiss plus magnifi
## 123 123 116 109 103 97
## glass tilt
## 97 97
## [1] "Top emojis for hes_so :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 arrow_right ➡️ Symbols 320
## 2 arrow_heading_down ⤵️ Symbols 246
## 3 book 📖 Objects 115
## 4 mag_right 🔎 Objects 97
## 5 mega 📣 Objects 53
## 6 clapper 🎬 Objects 38
## 7 NEW_button 🆕 Symbols 35
## 8 computer 💻 Objects 35
## 9 microscope 🔬 Objects 32
## 10 bulb 💡 Objects 29
## 11 police_car_light 🚨 Travel & Places 28
## 12 backhand_index_pointing_right 👉 People & Body 27
## 13 graduation_cap 🎓 Objects 23
## 14 studio_microphone 🎙️ Objects 23
## 15 clap 👏 People & Body 21
## 16 date 📅 Objects 17
## 17 medal_sports 🏅 Activities 15
## 18 memo 📝 Objects 15
## 19 woman 👩 People & Body 15
## 20 flag_Switzerland 🇨🇭 Flags 14
## [1] "Most common words for hslu :"
## @hslu luzern mehr hochschul depart
## 363 332 258 198 175
## #hsluinformatik heut neue schweizer zeigt
## 172 171 165 146 133
## design knnen studi schweiz gibt
## 132 127 125 118 114
## jahr ab neuen projekt arbeit
## 106 105 105 103 99
## [1] "Top emojis for hslu :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 sparkles ✨ Activities 29
## 2 flag_Switzerland 🇨🇭 Flags 20
## 3 party_popper 🎉 Activities 12
## 4 rocket 🚀 Travel & Places 12
## 5 partying_face 🥳 Smileys & Emotion 11
## 6 bottle_with_popping_cork 🍾 Food & Drink 9
## 7 Christmas_tree 🎄 Activities 7
## 8 clap 👏 People & Body 7
## 9 star ⭐ Travel & Places 7
## 10 glowing_star 🌟 Travel & Places 6
## 11 +1 👍 People & Body 5
## 12 bulb 💡 Objects 5
## 13 clinking_glasses 🥂 Food & Drink 5
## 14 smiling_face_with_sunglasses 😎 Smileys & Emotion 5
## 15 camera_flash 📸 Objects 4
## 16 four_leaf_clover 🍀 Animals & Nature 4
## 17 musical_notes 🎶 Objects 4
## 18 person_running 🏃 People & Body 4
## 19 raised_hands 🙌 People & Body 4
## 20 robot 🤖 Smileys & Emotion 4
## [1] "Most common words for ost_fh :"
## #ostschweizerfachhochschul @ost_fh
## 72 63
## ost @ozg_ost
## 55 28
## mehr neue
## 26 22
## st.gallen rapperswil
## 17 17
## neuen ostschweiz
## 17 15
## #informatik podcast
## 15 15
## detail gibt
## 15 14
## #ost menschen
## 12 12
## thema campus
## 12 12
## @eastdigit #podcast
## 12 12
## [1] "Top emojis for ost_fh :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 graduation_cap 🎓 Objects 3
## 2 man 👨 People & Body 2
## 3 man_student 👨🎓 People & Body 2
## 4 rocket 🚀 Travel & Places 2
## 5 snowflake ❄️ Travel & Places 2
## 6 backhand_index_pointing_right 👉 People & Body 1
## 7 brain 🧠 People & Body 1
## 8 chocolate_bar 🍫 Food & Drink 1
## 9 clapper 🎬 Objects 1
## 10 eyes 👀 People & Body 1
## 11 fire 🔥 Travel & Places 1
## 12 flexed_biceps 💪 People & Body 1
## 13 grinning 😀 Smileys & Emotion 1
## 14 heart_eyes_cat 😻 Smileys & Emotion 1
## 15 high_voltage ⚡ Travel & Places 1
## 16 mantelpiece_clock 🕰️ Travel & Places 1
## 17 sleeping 😴 Smileys & Emotion 1
## 18 slightly_smiling_face 🙂 Smileys & Emotion 1
## 19 sun ☀️ Travel & Places 1
## 20 woman 👩 People & Body 1
## [1] "Most common words for supsi_ch :"
## supsi #supsiev #supsinew info formazion studenti
## 224 175 167 148 132 132
## progetto @supsi_ch iscrizioni master nuovo bachelor
## 126 123 117 117 116 113
## right innov dipartimento pi oggi @usi_univers
## 113 108 103 103 102 98
## informazioni manag
## 97 94
## [1] "Top emojis for supsi_ch :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 arrow_right ➡️ Symbols 84
## 2 backhand_index_pointing_right 👉 People & Body 24
## 3 arrow_forward ▶️ Symbols 18
## 4 graduation_cap 🎓 Objects 17
## 5 bulb 💡 Objects 10
## 6 flag_Switzerland 🇨🇭 Flags 9
## 7 rocket 🚀 Travel & Places 9
## 8 party_popper 🎉 Activities 8
## 9 clap 👏 People & Body 7
## 10 exclamation ❗ Symbols 5
## 11 SOON_arrow 🔜 Symbols 4
## 12 grinning_face_with_big_eyes 😃 Smileys & Emotion 4
## 13 Christmas_tree 🎄 Activities 3
## 14 camera_flash 📸 Objects 3
## 15 computer 💻 Objects 3
## 16 movie_camera 🎥 Objects 3
## 17 pushpin 📌 Objects 3
## 18 rainbow 🌈 Travel & Places 3
## 19 studio_microphone 🎙️ Objects 3
## 20 woman 👩 People & Body 3
# Generate general tokens for bigram and trigram analysis
tokens <- tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
# Bigram Wordcloud
bi_gram_tokens <- tokens_ngrams(tokens, n = 2)
dfm_bi_gram <- dfm(bi_gram_tokens)
freqs_bi_gram <- sort(colSums(dfm_bi_gram), decreasing = TRUE)
head(freqs_bi_gram, 20)
## right_arrow htw_chur index_point
## 421 259 207
## backhand_index hochschul_luzern point_right
## 206 185 183
## berner_fachhochschul sozial_arbeit prof_dr
## 157 154 142
## haut_cole herzlich_gratul open_book
## 141 139 117
## magnifi_glass glass_tilt tilt_right
## 97 97 97
## fh_graubnden neusten_blogbeitrag book_#revuehmisphr
## 91 87 85
## social_media advanc_studi
## 84 83
# Create the bigram word cloud
set.seed(123)
wordcloud(
words = names(freqs_bi_gram),
freq = freqs_bi_gram,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# Trigram Wordcloud
tri_gram_tokens <- tokens_ngrams(tokens, n = 3)
dfm_tri_gram <- dfm(tri_gram_tokens)
freqs_tri_gram <- sort(colSums(dfm_tri_gram), decreasing = TRUE)
head(freqs_tri_gram, 20)
## backhand_index_point index_point_right
## 206 183
## magnifi_glass_tilt glass_tilt_right
## 97 97
## open_book_#revuehmisphr hochschul_gestaltung_kunst
## 85 62
## dipartimento_tecnologi_innov master_advanc_studi
## 40 38
## depart_sozial_arbeit #infoanlass_mrz_findet
## 36 33
## polic_car_light univers_appli_scienc
## 32 31
## busi_administr_statt findet_#zrich_infoanlass
## 30 30
## tag_offenen_tr hochschul_life_scienc
## 29 29
## gestaltung_kunst_fhnw mas_busi_administr
## 29 28
## mehr_neuen_blogbeitrag mehr_neusten_blogbeitrag
## 28 28
# Create the trigram word cloud
set.seed(123)
wordcloud(
words = names(freqs_tri_gram),
freq = freqs_tri_gram,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
The Latent Dirichlet Allocation (LDA) model was applied to the entire dataset of tweets to identify common topics. Here, the model with 5 topics was selected, and the top terms for each topic were extracted.
The resulting topics are not sharply separated, which suggests that the tweets are quite similar in content and style. Even so, the model shows that themes such as academic and research activities, student activities, and new developments are prominent in the tweets.
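Since the choice of 5 topics is a modeling decision, it can be worth comparing a few candidate values of k. A minimal sketch (assuming the `new_dfm` object created in the following chunk and the topicmodels package; fitting several models can be slow):

```r
# Compare candidate topic counts by held-in perplexity (lower is better).
library(topicmodels)

candidate_k <- c(3, 5, 7, 10)
perplexities <- sapply(candidate_k, function(k) {
  fit <- LDA(new_dfm, k = k, control = list(seed = 123))
  perplexity(fit)
})
data.frame(k = candidate_k, perplexity = perplexities)
```

Perplexity tends to keep decreasing with k, so it is best read together with the interpretability of the resulting topics rather than optimized on its own.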
# Source: Christoph Zangger -> removes all rows containing only zeros (empty documents)
new_dfm <- dfm_subset(dfm_list$en, ntoken(dfm_list$en) > 0)
tweet_lda <- LDA(new_dfm, k = 5, control = list(seed = 123))
# Tidy the LDA results
topic_terms <- tidy(tweet_lda, matrix = "beta")
# Extract topics and top terms
topics <- as.data.frame(terms(tweet_lda, 50)) # First fifty words per topic
# Extract top terms per topic
top_terms <- topic_terms %>%
group_by(topic) %>%
top_n(8, beta) %>%
ungroup() %>%
arrange(topic, -beta)
print(as.data.frame(terms(tweet_lda, 20)))
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 @hslu scienc @hslu univers new
## 2 innov @zhaw student day student
## 3 studi today innov student @supsi_ch
## 4 student thank new appli swiss
## 5 start switzerland research @zhaw @zhaw
## 6 thank project switzerland open univers
## 7 project @bfh_hesb us digit present
## 8 now welcom now great manag
## 9 can now @fhnw swiss confer
## 10 scienc @fhnwbusi day welcom switzerland
## 11 right #switzerland project team studi
## 12 swiss confer der @bfh_hesb great
## 13 #innov learn event manag @fhnw
## 14 @supsi_ch congrat challeng call today
## 15 video student join next project
## 16 educ open swiss design take
## 17 develop join @academies_ch read happi
## 18 hand last #digitale21 scienc join
## 19 intern research visit thank exchang
## 20 @fhnwbusi @hslu develop learn forward
# Visualize top terms per topic
top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~topic, scales = "free") +
scale_y_reordered() +
labs(
x = "Beta (Term Importance within Topic)",
y = "Terms",
title = "Top Terms per Topic in Tweets (LDA)"
) +
theme_minimal()
# Most different words among topics (using log ratios)
diff <- topic_terms %>%
mutate(topic = paste0("topic", topic)) %>%
spread(topic, beta) %>%
filter(topic1 > .001 | topic2 > .001 | topic3 > .001) %>%
mutate(
logratio_t1t2 = log2(topic2 / topic1),
logratio_t1t3 = log2(topic3 / topic1),
logratio_t2t3 = log2(topic3 / topic2)
)
diff
## # A tibble: 313 × 9
## term topic1 topic2 topic3 topic4 topic5 logratio_t1t2 logratio_t1t3
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 @academi… 2.26e-3 6.89e-4 4.17e-3 8.28e-5 2.11e-3 -1.71 0.881
## 2 @bfh_hesb 1.03e-3 5.87e-3 1.19e-3 4.82e-3 2.62e-3 2.52 0.218
## 3 @empa_ch 1.07e-3 4.16e-6 1.58e-4 5.76e-5 4.35e-4 -8.01 -2.76
## 4 @enginee… 6.40e-5 1.38e-4 1.73e-3 1.19e-4 7.10e-4 1.11 4.75
## 5 @esnchur 5.17e-4 4.43e-4 1.03e-3 4.06e-4 1.90e-5 -0.222 0.992
## 6 @fh_grau… 6.45e-4 1.45e-3 6.90e-4 9.07e-4 7.90e-4 1.17 0.0983
## 7 @fhnw 3.90e-4 3.08e-3 5.15e-3 2.83e-3 4.77e-3 2.98 3.72
## 8 @fhnwbusi 3.75e-3 5.59e-3 1.95e-3 4.77e-4 1.00e-3 0.577 -0.945
## 9 @greater… 5.47e-4 1.24e-3 1.09e-3 1.22e-3 7.35e-4 1.18 0.993
## 10 @grstift… 2.04e-4 3.60e-3 1.59e-3 1.25e-3 6.04e-4 4.14 2.96
## # ℹ 303 more rows
## # ℹ 1 more variable: logratio_t2t3 <dbl>
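The log ratios in `diff` are easier to read when plotted. A minimal sketch for the words that most distinguish topic 1 from topic 2 (column names as computed above):

```r
# Plot the 20 terms with the largest absolute log ratio between topics 1 and 2.
# Positive bars lean toward topic 2, negative bars toward topic 1.
library(dplyr)
library(ggplot2)

diff %>%
  top_n(20, abs(logratio_t1t2)) %>%
  mutate(term = reorder(term, logratio_t1t2)) %>%
  ggplot(aes(logratio_t1t2, term, fill = logratio_t1t2 > 0)) +
  geom_col(show.legend = FALSE) +
  labs(
    x = "Log2 ratio of beta (topic 2 / topic 1)",
    y = NULL,
    title = "Most Distinctive Terms: Topic 1 vs. Topic 2"
  ) +
  theme_minimal()
```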
# LDA Topic Modeling for each university
universities <- unique(tweets$university)
for (uni in universities) {
# Filter tweets for the current university
uni_tweets <- tweets %>% filter(university == uni & lang %in% langs)
tokens_uni <- uni_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
uni_dfm <- dfm(tokens_uni)
# Apply LDA
uni_dfm <- dfm_subset(uni_dfm, ntoken(uni_dfm) > 0)
tweet_lda <- LDA(uni_dfm, k = 5, control = list(seed = 123))
# Tidy the LDA results
tweet_lda_td <- tidy(tweet_lda)
# Extract top terms per topic
top_terms <- tweet_lda_td %>%
group_by(topic) %>%
top_n(8, beta) %>%
ungroup() %>%
arrange(topic, -beta)
# Visualize top terms per topic
p <- top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~topic, scales = "free") +
scale_y_reordered() +
labs(
x = "Beta (Term Importance within Topic)",
y = NULL,
title = paste("Top Terms per Topic in Tweets from", uni, "(LDA)")
)
print(p)
# Topic Model Summary: top 10 terms per topic
cat("\nTopic Model Summary for", uni, ":\n")
print(as.data.frame(terms(tweet_lda, 10)))
}
##
## Topic Model Summary for FHNW :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 @hsafhnw fhnw @fhnwbusi @fhnwbusi @fhnwbusi
## 2 neue hochschul @hsafhnw fhnw mehr
## 3 heut index @fhnw campus @fhnw
## 4 fhnw backhand hochschul entwickelt @fhnwtechnik
## 5 @fhnwtechnik @fhnwpsychologi @fhnwtechnik brugg-windisch neuen
## 6 olten point heut mehr neue
## 7 mehr projekt mehr @fhnwpsychologi ab
## 8 hochschul heut studierend @fhnwtechnik studierend
## 9 swiss right fhnw basel geht
## 10 basel neuen gibt schweizer erklrt
##
## Topic Model Summary for FH_Graubuenden :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 chur @htwchurtour chur statt mehr
## 2 htw student studi findet blogbeitrag
## 3 #chur #htwchur via #infoanlass neuen
## 4 #htwchur @suedostschweiz htw infoanlass #fhgr
## 5 #studium tourism #htwchur #htwchur graubnden
## 6 #fhgr @fh_graubuenden #smartcultur chur neusten
## 7 graubnden studierend heut manag fh
## 8 #schweiz dank @suedostschweiz mas prof
## 9 #graubnden @infowisschur media busi dr
## 10 studium institut @clickandtri mrz wurd
##
## Topic Model Summary for ZHAW :
## Topic 1 Topic 2 Topic 3 Topic 4
## 1 zhaw knnen zhaw @zhaw
## 2 @engineeringzhaw @engineeringzhaw @engineeringzhaw cc
## 3 @zhaw #zhaw neue #zhawimpact
## 4 winterthur zeigt geht zhaw
## 5 knnen studi mehr @iam_winterthur
## 6 @sml_zhaw neue cc heut
## 7 via dank zeigt #zhaw
## 8 zeigt schweizer #zhaw via
## 9 heut heut gibt zukunft
## 10 ab neuen immer neuen
## Topic 5
## 1 @zhaw
## 2 #zhaw
## 3 @iam_winterthur
## 4 studi
## 5 zhaw
## 6 dank
## 7 mehr
## 8 @sml_zhaw
## 9 #toniar
## 10 #tonitag
##
## Topic Model Summary for bfh :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 bfh bern bfh bfh neue
## 2 fachhochschul @bfh_hesb @bfh_hesb thema thema
## 3 berner berner schweiz mehr #knoten_maschen
## 4 bern mehr mehr knnen berner
## 5 mehr bfh neuen @hkb_bfh bern
## 6 @hkb_bfh unternehmen neue bern schweizer
## 7 entwickelt neue innen statt welch
## 8 anmelden projekt bern biel gibt
## 9 arbeit ab gesundheit fachhochschul @bfh_sosec
## 10 sozial thema schweizer erfahren day
##
## Topic Model Summary for hes_so :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 projet right hes-so right hes-so
## 2 arrow projet tudiant arrow haut
## 3 dan cole dan open right
## 4 right tilt @hes_so book tudiant
## 5 #hes_so dan arrow master arrow
## 6 @hessovalai glass projet #revuehmisphr projet
## 7 dcouvrez arrow recherch suiss dan
## 8 @hes_so journ nouveau recherch @hessovalai
## 9 #revuehmisphr format suiss cole @radiotelesuiss
## 10 particip @hes_so haut scienc #hes_so
##
## Topic Model Summary for hslu :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 @hslu luzern @hslu zeigt #hsluinformatik
## 2 luzern @hslu luzern heut mehr
## 3 mehr depart hochschul digit schweizer
## 4 #hsluinformatik hochschul knnen mehr @hslu
## 5 neue heut mehr hochschul projekt
## 6 depart neue campus welch #hsluwirtschaft
## 7 thema mehr jahr projekt schweiz
## 8 design geht #hsludk neue luzern
## 9 schweizer zeigt findet schweiz studierend
## 10 statt knnen statt studi studi
##
## Topic Model Summary for ost_fh :
## Topic 1 Topic 2
## 1 #ostschweizerfachhochschul #ostschweizerfachhochschul
## 2 @ost_fh @ost_fh
## 3 bachelor ost
## 4 neue @ozg_ost
## 5 #diplomfei kontrast
## 6 podcast kulturzyklus
## 7 ab #ausdreiwirdein
## 8 #bachelor fhs
## 9 online-infoabend #ost
## 10 statt mehr
## Topic 3 Topic 4
## 1 @ost_fh ost
## 2 ost @ost_fh
## 3 #ostschweizerfachhochschul @ozg_ost
## 4 rapperswil-jona #ostschweizerfachhochschul
## 5 campus rapperswil
## 6 detail neue
## 7 neu neuen
## 8 #wirtschaftsinformatik ostschweiz
## 9 gibt st.gallen
## 10 #informatik mehr
## Topic 5
## 1 #ostschweizerfachhochschul
## 2 menschen
## 3 mehr
## 4 @ost_fh
## 5 ost
## 6 spricht
## 7 prof
## 8 institut
## 9 team
## 10 @ozg_ost
##
## Topic Model Summary for supsi_ch :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 progetto formazion iscrizioni #supsinew supsi
## 2 #supsiev studenti master supsi info
## 3 @usi_univers oggi supsi #supsiev bachelor
## 4 #supsinew informazioni innov studenti corsi
## 5 competenz supsi apert right pi
## 6 arrow ottobr info formazion #supsiev
## 7 svizzera @supsi_ch bachelor @supsi_ch nuovo
## 8 @supsi_ch dipartimento design info progetto
## 9 informazioni maggiori advanc master studi
## 10 supsi iscrizioni @usi_univers ricerca campus
The LDA analysis reveals recognizable topics across all tweets, emphasizing academic activities, events, and institutional developments. Overall, the topics are quite similar across universities, which again indicates that the tweets resemble each other in content and style. Nevertheless, the model consistently surfaces themes such as academic and research activities, student activities, and new developments. Note also that stemming produces truncated word forms, which makes the topic terms harder to read and may affect which topics the algorithm finds.
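The effect of stemming on readability can be illustrated with a small example (assuming the quanteda package; the German sentence is a made-up illustration):

```r
# Stemming truncates German words to stems such as "studier" and "hochschul",
# which is why several topic terms above look clipped.
library(quanteda)

toks <- tokens("Studierende studieren an Schweizer Hochschulen")
tokens_wordstem(toks, language = "german")
```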
The distribution of tweet lengths varies across universities. Most tweets are concise, in line with Twitter's character limit: many cluster around 150 characters, and few approach the maximum length. Such short posts are typical of social media communication.
tweets %>%
mutate(tweet_length = nchar(full_text)) %>%
ggplot(aes(x = tweet_length)) +
geom_histogram() +
labs(title = "Distribution of Tweet Lengths")
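Since the prose notes that length distributions differ among institutions, a faceted version of the same histogram makes the per-university comparison explicit. A minimal sketch:

```r
# Compare tweet-length distributions per university instead of pooling them.
tweets %>%
  mutate(tweet_length = nchar(full_text)) %>%
  ggplot(aes(x = tweet_length)) +
  geom_histogram(bins = 30) +
  facet_wrap(~university, scales = "free_y") +
  labs(title = "Distribution of Tweet Lengths by University")
```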
Sentiment analysis was conducted to evaluate the emotional tone of the tweets, using the Syuzhet method to calculate a sentiment score for each tweet. Syuzhet was chosen for its ability to capture the emotions within text, which makes it effective for analyzing narrative structures. Additionally, the NRC sentiment dictionary was used for its coverage of emotions across multiple languages, including German, Italian, and French. See the documentation: https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html#:~:text=Multilingual%20Sentiment%20Lexicons. Together, Syuzhet and NRC enable detailed, multilingual sentiment analysis.
Overall sentiment trends:
- The sentiment scores vary over time and by university, showing fluctuations in the emotional tone of the tweets.
- Positive words commonly found in tweets include terms related to academic achievements, collaborations, and positive experiences.
- Negative words often relate to challenges, competitions, and issues faced by the universities.
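Beyond a single polarity score per tweet, syuzhet can also break sentiment down into the eight NRC emotion categories. A minimal sketch for the English tweets only (assuming the `tweets` object loaded above):

```r
# get_nrc_sentiment() returns, per text, counts for eight NRC emotions
# (anger, anticipation, disgust, fear, joy, sadness, surprise, trust)
# plus negative/positive; column sums give a corpus-level overview.
library(dplyr)
library(syuzhet)

en_tweets <- tweets %>% filter(lang == "en")
nrc_scores <- get_nrc_sentiment(en_tweets$full_text)
colSums(nrc_scores)
```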
# Calculate Sentiment for Supported Languages Only
langs <- c("de", "fr", "it", "en")
tweets_filtered <- tweets %>%
filter(lang %in% langs)
# Function to get sentiment based on language
get_multilang_sentiment <- function(text, lang) {
if (lang == "de") {
return(get_sentiment(text, method = "nrc", language = "german"))
} else if (lang == "it") {
return(get_sentiment(text, method = "nrc", language = "italian"))
} else if (lang == "fr") {
return(get_sentiment(text, method = "nrc", language = "french"))
} else if (lang == "en") {
return(get_sentiment(text, method = "syuzhet"))
} else {
return(NA) # Return NA for unsupported languages
}
}
# Calculate Syuzhet Sentiment for each Tweet
tweets_filtered$sentiment <-
mapply(
get_multilang_sentiment,
tweets_filtered$full_text, tweets_filtered$lang
)
plot_data <- tweets_filtered %>%
group_by(university, month) %>%
summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))
# Convert month to Date format
plot_data <- plot_data %>%
mutate(month = as.Date(month, format = "%Y-%m-%d"))
# Plot Sentiment by all Universities
ggplot(plot_data, aes(
x = month,
y = mean_sentiment,
color = university, group = university
)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(
title = "Mean Sentiment Over Time by University",
y = "Mean Sentiment Score",
x = "Month"
) +
scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
# Apply theme_minimal() first so it does not override the angled axis labels
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
for (uni in unique(tweets_filtered$university)) {
most_used_lang <- tweets %>%
filter(university == uni) %>%
count(lang) %>%
slice_max(n = 1, order_by = n) %>%
pull(lang)
uni_tweets <- tweets_filtered %>%
filter(university == uni & lang == most_used_lang)
plot_data <- uni_tweets %>%
group_by(month) %>%
summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))
# Plot Syuzhet Sentiment Over Time (Per University)
print(ggplot(plot_data, aes(x = month, y = mean_sentiment, group = 1)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(
title = paste0("Mean Syuzhet Sentiment Over Time - ", uni),
y = "Mean Sentiment Score",
x = "Month"
))
# Tokenize and Preprocess Words
uni_words <- uni_tweets %>%
unnest_tokens(word, full_text_emojis) %>%
filter(nchar(word) > 3) %>%
filter(!str_detect(word, "\\d")) %>%
filter(!str_detect(word, "https?://\\S+|www\\.\\S+|t\\.co|http|https"))
# Remove stopwords after counting word frequency
word_counts <- uni_words %>%
count(word, sort = TRUE) %>%
anti_join(get_stopwords(language = most_used_lang), by = "word")
sentiment_words <- word_counts %>%
mutate(sentiment = get_multilang_sentiment(word, most_used_lang))
# Separate Positive and Negative Words
positive_words <- sentiment_words %>%
filter(sentiment >= 0) %>%
arrange(desc(n)) %>%
rename(freq = n)
negative_words <- sentiment_words %>%
filter(sentiment < 0) %>%
arrange(desc(n)) %>%
rename(freq = n)
# Create and Display Word Clouds
print(paste0("Positive words for: ", uni, " in ", most_used_lang))
print(head(positive_words, 20))
wordcloud(
words = positive_words$word,
freq = positive_words$freq,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
print(paste0("Negative words for: ", uni, " in ", most_used_lang))
print(head(negative_words, 20))
wordcloud(
words = negative_words$word,
freq = negative_words$freq,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
}
## [1] "Positive words for: FHNW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FHNW fhnw 1598 0
## 2 FHNW fhnwbusiness 342 0
## 3 FHNW hsafhnw 253 0
## 4 FHNW mehr 232 0
## 5 FHNW hochschule 219 0
## 6 FHNW fhnwtechnik 204 0
## 7 FHNW campus 170 0
## 8 FHNW heute 147 0
## 9 FHNW fhnwpsychologie 138 0
## 10 FHNW basel 134 0
## 11 FHNW neue 134 0
## 12 FHNW studierende 123 0
## 13 FHNW brugg 111 0
## 14 FHNW neuen 108 0
## 15 FHNW schweiz 107 0
## 16 FHNW windisch 101 0
## 17 FHNW olten 98 0
## 18 FHNW knnen 97 0
## 19 FHNW projekt 86 0
## 20 FHNW prof 83 0
## [1] "Negative words for: FHNW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FHNW wettbewerb 13 -1
## 2 FHNW problem 12 -1
## 3 FHNW trotz 12 -2
## 4 FHNW bakterien 10 -1
## 5 FHNW herausforderung 10 -1
## 6 FHNW junge 10 -1
## 7 FHNW krise 10 -1
## 8 FHNW sucht 10 -1
## 9 FHNW spiel 8 -1
## 10 FHNW spielen 6 -1
## 11 FHNW kaum 5 -1
## 12 FHNW lust 5 -1
## 13 FHNW nachfrage 5 -1
## 14 FHNW reihe 5 -1
## 15 FHNW schwer 5 -3
## 16 FHNW tragen 5 -1
## 17 FHNW warten 5 -1
## 18 FHNW angriff 4 -2
## 19 FHNW bisher 4 -1
## 20 FHNW eingeweiht 4 -1
## [1] "Positive words for: FH_Graubuenden in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FH_Graubuenden htw_chur 553 0
## 2 FH_Graubuenden chur 469 0
## 3 FH_Graubuenden htwchur 345 0
## 4 FH_Graubuenden infoanlass 327 0
## 5 FH_Graubuenden statt 299 0
## 6 FH_Graubuenden findet 271 0
## 7 FH_Graubuenden graubnden 258 0
## 8 FH_Graubuenden studium 255 0
## 9 FH_Graubuenden mehr 230 0
## 10 FH_Graubuenden blogbeitrag 213 0
## 11 FH_Graubuenden tourismus 177 0
## 12 FH_Graubuenden fhgr 172 0
## 13 FH_Graubuenden bachelor 162 0
## 14 FH_Graubuenden management 136 0
## 15 FH_Graubuenden neuen 128 0
## 16 FH_Graubuenden heute 125 0
## 17 FH_Graubuenden schweiz 123 0
## 18 FH_Graubuenden multimedia 107 0
## 19 FH_Graubuenden photonics 104 0
## 20 FH_Graubuenden suedostschweiz 104 0
## [1] "Negative words for: FH_Graubuenden in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FH_Graubuenden sucht 15 -1
## 2 FH_Graubuenden wettbewerb 13 -1
## 3 FH_Graubuenden korruption 12 -1
## 4 FH_Graubuenden junge 10 -1
## 5 FH_Graubuenden hoch 8 -1
## 6 FH_Graubuenden herausforderung 7 -1
## 7 FH_Graubuenden rutsch 7 -1
## 8 FH_Graubuenden kick 6 -1
## 9 FH_Graubuenden vergessen 6 -3
## 10 FH_Graubuenden kochen 5 -1
## 11 FH_Graubuenden problem 5 -1
## 12 FH_Graubuenden lust 4 -1
## 13 FH_Graubuenden spielen 4 -1
## 14 FH_Graubuenden fall 3 -1
## 15 FH_Graubuenden falls 3 -1
## 16 FH_Graubuenden gefahr 3 -2
## 17 FH_Graubuenden kampf 3 -5
## 18 FH_Graubuenden krebs 3 -2
## 19 FH_Graubuenden krise 3 -1
## 20 FH_Graubuenden nachfrage 3 -1
## [1] "Positive words for: ZHAW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ZHAW zhaw 901 0
## 2 ZHAW engineeringzhaw 236 0
## 3 ZHAW iam_winterthur 141 0
## 4 ZHAW studie 139 1
## 5 ZHAW mehr 138 0
## 6 ZHAW schweiz 138 0
## 7 ZHAW knnen 137 0
## 8 ZHAW heute 134 0
## 9 ZHAW zeigt 131 0
## 10 ZHAW schweizer 126 0
## 11 ZHAW unsere 126 0
## 12 ZHAW sml_zhaw 123 0
## 13 ZHAW zhawimpact 120 0
## 14 ZHAW neue 117 0
## 15 ZHAW winterthur 117 0
## 16 ZHAW danke 102 0
## 17 ZHAW studierende 84 0
## 18 ZHAW gibt 83 0
## 19 ZHAW neuen 82 0
## 20 ZHAW dass 81 0
## [1] "Negative words for: ZHAW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ZHAW trotz 19 -2
## 2 ZHAW fall 15 -1
## 3 ZHAW junge 14 -1
## 4 ZHAW sucht 11 -1
## 5 ZHAW falls 10 -1
## 6 ZHAW krise 9 -1
## 7 ZHAW problem 9 -1
## 8 ZHAW spielen 8 -1
## 9 ZHAW behinderung 7 -5
## 10 ZHAW herausforderung 7 -1
## 11 ZHAW bisher 6 -1
## 12 ZHAW bund 6 -1
## 13 ZHAW druck 6 -1
## 14 ZHAW fehlt 5 -2
## 15 ZHAW hoch 5 -1
## 16 ZHAW laut 5 -1
## 17 ZHAW rauch 5 -1
## 18 ZHAW schwer 5 -3
## 19 ZHAW stress 5 -1
## 20 ZHAW tragen 5 -1
## [1] "Positive words for: bfh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 bfh bern 290 0
## 2 bfh berner 226 0
## 3 bfh mehr 213 0
## 4 bfh thema 196 0
## 5 bfh neue 184 0
## 6 bfh bfh_hesb 169 0
## 7 bfh fachhochschule 164 0
## 8 bfh biel 112 0
## 9 bfh hkb_bfh 112 0
## 10 bfh knnen 109 0
## 11 bfh holz 106 0
## 12 bfh knoten_maschen 106 0
## 13 bfh hafl 104 0
## 14 bfh heute 101 0
## 15 bfh anmelden 100 0
## 16 bfh innen 100 0
## 17 bfh schweiz 100 0
## 18 bfh schweizer 100 0
## 19 bfh neuen 96 0
## 20 bfh online 93 0
## [1] "Negative words for: bfh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 bfh krise 22 -1
## 2 bfh wettbewerb 19 -1
## 3 bfh armut 15 -1
## 4 bfh junge 11 -1
## 5 bfh herausforderung 9 -1
## 6 bfh batterie 8 -1
## 7 bfh boden 8 -2
## 8 bfh schwer 8 -3
## 9 bfh stress 8 -1
## 10 bfh sucht 8 -1
## 11 bfh bisher 7 -1
## 12 bfh kosten 7 -1
## 13 bfh liegen 7 -1
## 14 bfh prozess 7 -1
## 15 bfh sterben 7 -1
## 16 bfh trotz 7 -2
## 17 bfh fehlt 6 -2
## 18 bfh hoch 6 -1
## 19 bfh kaum 6 -1
## 20 bfh spielen 6 -1
## [1] "Positive words for: hes_so in fr"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hes_so right 433 0
## 2 hes_so arrow 324 0
## 3 hes_so hes_so 267 0
## 4 hes_so recherche 184 0
## 5 hes_so projet 148 0
## 6 hes_so master 144 0
## 7 hes_so hessovalais 133 0
## 8 hes_so suisse 126 0
## 9 hes_so open 117 0
## 10 hes_so book 116 0
## 11 hes_so haute 108 0
## 12 hes_so projets 104 0
## 13 hes_so sant 104 0
## 14 hes_so tudiantes 104 0
## 15 hes_so plus 103 0
## 16 hes_so revuehmisphres 97 0
## 17 hes_so glass 96 0
## 18 hes_so magnifying 96 0
## 19 hes_so tilted 96 0
## 20 hes_so genve 86 0
## [1] "Negative words for: hes_so in fr"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hes_so programme 58 -1
## 2 hes_so lance 47 -1
## 3 hes_so entre 46 -1
## 4 hes_so appel 45 -1
## 5 hes_so salon 41 -1
## 6 hes_so contre 19 -1
## 7 hes_so crise 19 -1
## 8 hes_so tous 19 -1
## 9 hes_so livre 13 -1
## 10 hes_so rencontre 12 -1
## 11 hes_so vice 11 -1
## 12 hes_so demande 9 -1
## 13 hes_so vide 8 -1
## 14 hes_so destin 7 -1
## 15 hes_so plein 7 -1
## 16 hes_so sujet 7 -1
## 17 hes_so disposition 6 -1
## 18 hes_so douleur 6 -3
## 19 hes_so faon 6 -1
## 20 hes_so campagne 5 -1
## [1] "Positive words for: hslu in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hslu hslu 1645 0
## 2 hslu luzern 288 0
## 3 hslu mehr 257 0
## 4 hslu unsere 201 0
## 5 hslu hochschule 197 0
## 6 hslu informatik 183 0
## 7 hslu hsluinformatik 173 0
## 8 hslu heute 171 0
## 9 hslu schweizer 145 0
## 10 hslu studie 141 1
## 11 hslu design 137 0
## 12 hslu neue 136 0
## 13 hslu zeigt 132 0
## 14 hslu knnen 124 0
## 15 hslu arbeit 121 2
## 16 hslu schweiz 121 0
## 17 hslu bachelor 119 0
## 18 hslu departement 116 0
## 19 hslu studierende 113 0
## 20 hslu kunst 108 2
## [1] "Negative words for: hslu in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hslu junge 19 -1
## 2 hslu quiz 19 -1
## 3 hslu trotz 17 -2
## 4 hslu wettbewerb 17 -1
## 5 hslu krise 16 -1
## 6 hslu spiel 13 -1
## 7 hslu spielen 12 -1
## 8 hslu kaum 11 -1
## 9 hslu tragen 9 -1
## 10 hslu gewalt 8 -1
## 11 hslu problem 8 -1
## 12 hslu sucht 8 -1
## 13 hslu herausforderung 7 -1
## 14 hslu hype 7 -1
## 15 hslu kraft 7 -1
## 16 hslu bisher 6 -1
## 17 hslu bund 6 -1
## 18 hslu druck 6 -1
## 19 hslu hoch 6 -1
## 20 hslu kosten 6 -1
## [1] "Positive words for: ost_fh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ost_fh ostschweizerfachhochschule 72 0
## 2 ost_fh ost_fh 57 0
## 3 ost_fh podcast 32 0
## 4 ost_fh ozg_ost 28 0
## 5 ost_fh mehr 26 0
## 6 ost_fh kulturzyklus 24 0
## 7 ost_fh bachelor 22 0
## 8 ost_fh rapperswil 22 0
## 9 ost_fh informatik 21 0
## 10 ost_fh online 21 0
## 11 ost_fh neue 18 0
## 12 ost_fh neuen 17 0
## 13 ost_fh st.gallen 17 0
## 14 ost_fh ostschweizer 16 0
## 15 ost_fh details 15 0
## 16 ost_fh kontrast 15 0
## 17 ost_fh fachhochschule 14 0
## 18 ost_fh campus 13 0
## 19 ost_fh unsere 13 0
## 20 ost_fh eastdigital 12 0
## [1] "Negative words for: ost_fh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ost_fh reihe 4 -1
## 2 ost_fh abfall 2 -2
## 3 ost_fh behinderung 2 -5
## 4 ost_fh fall 2 -1
## 5 ost_fh junge 2 -1
## 6 ost_fh knapp 2 -1
## 7 ost_fh schlagen 2 -2
## 8 ost_fh trotz 2 -2
## 9 ost_fh aufsehen 1 -1
## 10 ost_fh ausrichtung 1 -1
## 11 ost_fh blind 1 -2
## 12 ost_fh blindheit 1 -1
## 13 ost_fh dringend 1 -1
## 14 ost_fh entscheiden 1 -1
## 15 ost_fh epilepsie 1 -1
## 16 ost_fh fesseln 1 -1
## 17 ost_fh gegenstand 1 -1
## 18 ost_fh gegner 1 -2
## 19 ost_fh gewalt 1 -1
## 20 ost_fh hype 1 -1
## [1] "Positive words for: supsi_ch in it"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 supsi_ch supsi 959 0
## 2 supsi_ch supsinews 152 0
## 3 supsi_ch bachelor 147 0
## 4 supsi_ch formazione 146 0
## 5 supsi_ch info 138 0
## 6 supsi_ch studenti 136 0
## 7 supsi_ch supsievent 136 0
## 8 supsi_ch master 129 0
## 9 supsi_ch lugano 125 0
## 10 supsi_ch progetto 125 0
## 11 supsi_ch nuovo 115 0
## 12 supsi_ch ticino 115 0
## 13 supsi_ch iscrizioni 114 0
## 14 supsi_ch dipartimento 102 0
## 15 supsi_ch oggi 102 0
## 16 supsi_ch ricerca 100 1
## 17 supsi_ch informazioni 97 1
## 18 supsi_ch right 95 0
## 19 supsi_ch scopri 90 0
## 20 supsi_ch svizzera 90 0
## [1] "Negative words for: supsi_ch in it"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 supsi_ch incontro 17 -1
## 2 supsi_ch partire 13 -1
## 3 supsi_ch partenza 12 -1
## 4 supsi_ch emergenza 11 -1
## 5 supsi_ch crisi 10 -1
## 6 supsi_ch sfida 10 -2
## 7 supsi_ch campagna 8 -1
## 8 supsi_ch discutere 6 -1
## 9 supsi_ch gruppo 6 -1
## 10 supsi_ch rischio 6 -1
## 11 supsi_ch disciplina 5 -1
## 12 supsi_ch giovanni 5 -1
## 13 supsi_ch intervento 5 -1
## 14 supsi_ch conseguenze 4 -1
## 15 supsi_ch disagio 4 -3
## 16 supsi_ch fondo 4 -1
## 17 supsi_ch gioco 4 -1
## 18 supsi_ch parole 4 -1
## 19 supsi_ch perdere 4 -2
## 20 supsi_ch periodo 4 -1
The analysis indicates that Swiss Universities of Applied Sciences differ markedly in the content, style, and emotional tone of their tweets. Most tweets cover academic achievements, projects, and institutional news, with the emotional tone varying from one university to the next.
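The word-sentiment frequency tables above can be produced with a tidytext-style workflow. A minimal sketch, assuming a lexicon tibble `sentiment_lexicon` with `word` and `sentiment` columns (a hypothetical name standing in for whichever German lexicon was used):

```r
library(dplyr)
library(tidytext)

word_sentiments <- tweets %>%
  filter(lang == "de") %>%
  # One row per word, lowercased and stripped of punctuation
  unnest_tokens(word, full_text) %>%
  # Drop German stop words
  anti_join(get_stopwords("de"), by = "word") %>%
  # Attach lexicon scores; unmatched words get a neutral score of 0
  left_join(sentiment_lexicon, by = "word") %>%
  mutate(sentiment = coalesce(sentiment, 0)) %>%
  count(university, word, sentiment, name = "freq", sort = TRUE) %>%
  group_by(university)

# Top 20 negative words per university
word_sentiments %>%
  filter(sentiment < 0) %>%
  slice_max(freq, n = 20)
```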
# Language Analysis
tweets %>%
filter(university == "bfh") %>%
count(lang) %>%
arrange(desc(n))
## # A tibble: 9 × 3
## # Groups: university [1]
## university lang n
## <chr> <chr> <int>
## 1 bfh de 3008
## 2 bfh en 135
## 3 bfh fr 35
## 4 bfh qam 8
## 5 bfh da 2
## 6 bfh es 2
## 7 bfh lt 2
## 8 bfh it 1
## 9 bfh zxx 1
# Pie chart of languages
tweets %>%
filter(university == "bfh") %>%
count(lang) %>%
ggplot(aes(x = "", y = n, fill = lang)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y") +
labs(title = "Language Distribution of Tweets by BFH")
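Pie charts make the many small language shares hard to compare; as an alternative sketch using the same counts, a log-scaled bar chart keeps the rare languages visible next to the dominant German share:

```r
library(dplyr)
library(ggplot2)

tweets %>%
  filter(university == "bfh") %>%
  count(lang, sort = TRUE) %>%
  mutate(lang = reorder(lang, n)) %>%
  ggplot(aes(x = n, y = lang)) +
  geom_col(fill = "#37556E") +
  # Log scale so "de" (3008 tweets) does not flatten the single-tweet languages
  scale_x_log10() +
  labs(title = "Language Distribution of Tweets by BFH",
       x = "Number of tweets (log scale)", y = "Language") +
  theme_minimal()
```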
# Emoji Analysis
emoji_count <- tweets %>%
filter(university == "bfh") %>%
top_n_emojis(full_text)
print(emoji_count)
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 49
## 2 battery 🔋 Objects 16
## 3 backhand_index_pointing_down 👇 People & Body 12
## 4 woman 👩 People & Body 12
## 5 palm_tree 🌴 Animals & Nature 11
## 6 bulb 💡 Objects 10
## 7 computer 💻 Objects 10
## 8 evergreen_tree 🌲 Animals & Nature 10
## 9 graduation_cap 🎓 Objects 10
## 10 party_popper 🎉 Activities 10
## 11 robot 🤖 Smileys & Emotion 10
## 12 rocket 🚀 Travel & Places 10
## 13 clap 👏 People & Body 9
## 14 coconut 🥥 Food & Drink 9
## 15 date 📅 Objects 9
## 16 deciduous_tree 🌳 Animals & Nature 9
## 17 flag_Switzerland 🇨🇭 Flags 9
## 18 automobile 🚗 Travel & Places 8
## 19 clinking_glasses 🥂 Food & Drink 8
## 20 seedling 🌱 Animals & Nature 8
# Sum of all emojis used
print(sum(emoji_count$n))
## [1] 239
emoji_count %>%
mutate(emoji_name = reorder(emoji_name, n)) %>%
ggplot(aes(n, emoji_name)) +
geom_col(fill = "#37556E") +
labs(
x = "Count",
y = "Emoji",
title = "Top 20 Emojis Used by BFH"
) +
theme_minimal()
heatmap_data_bfh <- tweets %>%
filter(university == "bfh") %>%
count(day, timeofday_hour) %>%
complete(day, timeofday_hour, fill = list(n = 0))
# Show when tweets are posted, by day and hour. The working hours stand out
# clearly as the times when most tweets are posted.
ggplot(heatmap_data_bfh, aes(x = day, y = timeofday_hour, fill = n)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "Heatmap of Tweet Activity", x = "Day", y = "Hour") +
theme_minimal()
engagement_hour_bfh <- tweets %>%
filter(university == "bfh") %>%
group_by(timeofday_hour) %>%
summarise(avg_engagement = mean(weighted_engagement, na.rm = TRUE))
# Looking at engagement by hour, most engagement occurs between roughly 4pm
# and 8pm, with a slight midday peak.
ggplot(engagement_hour_bfh, aes(x = timeofday_hour, y = avg_engagement)) +
geom_col(fill = "#37556E") +
labs(title = "Average Engagement by Hour", x = "Hour", y = "Avg Engagement") +
theme_minimal()
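Rather than reading the peak hours off the chart, they can also be extracted directly from the `engagement_hour_bfh` table computed above (a small sketch):

```r
library(dplyr)

# Top 5 hours by average weighted engagement for BFH
engagement_hour_bfh %>%
  arrange(desc(avg_engagement)) %>%
  slice_head(n = 5)
```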
plot_data_bfh <- tweets_filtered %>%
filter(university == "bfh") %>%
group_by(month) %>%
summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))
ggplot(plot_data_bfh, aes(x = month, y = mean_sentiment)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(
title = "Mean Syuzhet Sentiment Over Time for BFH",
y = "Mean Sentiment Score",
x = "Month"
) +
theme_minimal()
insights <- list(
"Most Active Hours" = hours_with_most_tweets_by_uni,
"Most Active Days" = days_with_most_tweets_by_uni,
"Content Analysis" = head(words_freqs_de),
"Sentiment Analysis" = head(tweets_filtered$sentiment)
)
Based on the analysis, the following recommendations can be made to enhance BFH’s communication strategy:
Implement a dashboard to track tweet performance, including engagement metrics, sentiment scores, and trending topics. This allows the tweet schedule and audience targeting to be adjusted in near real time; for strategic adjustments, monthly reports may suffice.
Develop a content calendar that aligns tweet releases with peak engagement times and days. This helps plan tweets in advance and ensures they are posted at optimal times.
Establish a feedback loop where the communication team reviews analytics data and adjusts strategies according to the findings. Regular team meetings can help discuss insights and support a data-driven approach to communication.
By integrating these recommendations and tools, BFH can enhance its communication strategy, ensuring that its messages are timely, relevant, and engaging for its audience.
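As a starting point for such a content calendar, the hourly activity and engagement tables computed above (`heatmap_data_bfh` and `engagement_hour_bfh`) can be combined into a ranked list of candidate posting slots. This is only a sketch; in practice the ranking criteria would need refinement (e.g. weighting engagement against posting volume):

```r
library(dplyr)

# Rank day/hour slots: highest average engagement first,
# breaking ties by historical tweet volume in that slot
posting_slots <- heatmap_data_bfh %>%
  left_join(engagement_hour_bfh, by = "timeofday_hour") %>%
  arrange(desc(avg_engagement), desc(n)) %>%
  slice_head(n = 10)

posting_slots
```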